Search Result

Domain ontology driven approach for bidding webpage parsing

MA Dongxue, SONG She, XIE Zhenping, LIU Yuan

Journal of Computer Applications 2020, 40 (6): 1574-1579. DOI: 10.11772/j.issn.1001-9081.2019101792

To address the low efficiency of parsing bidding webpages with regular expressions, a new automatic method based on a bidding ontology model was proposed. Firstly, the structural features of bidding webpage texts were analyzed. Secondly, a lightweight domain knowledge model of the bidding ontology was constructed. Finally, a new algorithm for semantic matching and extraction of bidding webpage elements was introduced to realize automatic parsing of bidding webpages. The experimental results show that the accuracy and recall of the new method reach 95.33% and 88.29% respectively with adaptive parsing, which are 3.98 and 3.81 percentage points higher, respectively, than those of the regular expression method. The proposed method can adaptively realize structured parsing and extraction of semantic information in bidding webpages, and can satisfy the requirements of practical applications.
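
As a rough illustration of matching webpage elements against a domain vocabulary instead of writing one regular expression per page layout, the following minimal sketch maps field labels to concepts through a small synonym table. It is not the authors' algorithm: the `BIDDING_ONTOLOGY` contents, the concept names, and the "label: value" line layout are all assumptions made for the example, and plain synonym lookup is a strong simplification of semantic matching.

```python
import re

# Hypothetical, hand-written vocabulary: concept -> labels seen on pages.
BIDDING_ONTOLOGY = {
    "project_name": ["project name", "name of project", "tender title"],
    "budget": ["budget", "budget amount", "estimated price"],
    "deadline": ["deadline", "bid submission deadline", "closing date"],
}


def build_label_index(ontology):
    """Invert the vocabulary so each known label points to its concept."""
    return {
        label: concept
        for concept, labels in ontology.items()
        for label in labels
    }


def parse_bidding_text(text, ontology=BIDDING_ONTOLOGY):
    """Extract known fields from lines shaped like 'label: value'."""
    index = build_label_index(ontology)
    result = {}
    for line in text.splitlines():
        # Split at the first colon only, so values may contain colons.
        label, sep, value = line.partition(":")
        if not sep:
            continue
        # Normalize case and whitespace before looking the label up.
        key = re.sub(r"\s+", " ", label).strip().lower()
        concept = index.get(key)
        if concept and concept not in result:
            result[concept] = value.strip()
    return result
```

With this layout, supporting a page that uses a new label means adding one synonym to the table rather than writing another pattern.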

Efficient block-based sampling algorithm for aggregation query processing on duplicate charged records

PAN Mingyu, ZHANG Lu, LONG Guobiao, LI Xianglong, MA Dongxue, XU Liang

Journal of Computer Applications 2018, 38 (6): 1596-1600. DOI: 10.11772/j.issn.1001-9081.2017112632

Existing query analysis methods usually treat entity resolution as an offline preprocessing step that cleans the whole data set. However, as data size keeps growing, this offline cleaning mode, with its high computational complexity, can hardly meet the real-time analysis needs of most applications. To solve the problem of aggregation queries on duplicate charged records, a new method integrating entity resolution with approximate aggregation query processing was proposed. Firstly, a block-based sampling strategy was adopted to collect samples. Then, an entity recognition method was used to identify duplicate entities in the collected samples. Finally, an unbiased estimate of the aggregated results was reconstructed according to the results of entity recognition. The proposed method avoids the time cost of identifying all entities, and returns query results that satisfy user needs by performing entity recognition on only a small sample of the data. The experimental results on both real and synthetic datasets demonstrate the efficiency and reliability of the proposed method.
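
The three steps (sample blocks, resolve entities within the sample, scale up the result) can be sketched as follows for a SUM query. This is a minimal sketch under a strong assumption that the abstract does not state: all duplicates of an entity fall inside the same block, so deduplicating each sampled block separately is sufficient. The function name `estimate_sum`, the `amount` field, and the `resolve` callback are hypothetical, and the estimator is the standard cluster-sampling scale-up rather than the authors' own.

```python
import random


def estimate_sum(blocks, sample_size, resolve, rng=None):
    """Estimate SUM(amount) over distinct entities from sampled blocks.

    Assumes every duplicate of an entity lives in the same block.
    `resolve` maps a record to a canonical entity key.
    """
    rng = rng or random.Random()
    # Step 1: block-based sampling, uniform without replacement.
    sampled = rng.sample(blocks, sample_size)

    block_totals = []
    for block in sampled:
        # Step 2: entity resolution on the sampled block only,
        # keeping one amount per resolved entity.
        seen = {}
        for record in block:
            seen.setdefault(resolve(record), record["amount"])
        block_totals.append(sum(seen.values()))

    # Step 3: scale the mean deduplicated block total by the number
    # of blocks; unbiased under uniform block sampling.
    return len(blocks) * sum(block_totals) / sample_size
```

Only the sampled blocks are ever deduplicated, which is where the time saving over cleaning the whole data set comes from; when `sample_size` equals the number of blocks, the estimate coincides with the exact deduplicated sum.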
